Writing a custom dataset

This notebook will walk you through implementing a custom iterator for a modified version of the Street View House Number (SVHN) dataset. You will then design a network to train on this dataset.

SVHN dataset

This dataset is a collection of 73,257 images of house numbers collected from Google Street View. The original dataset includes bounding boxes for all the digits in each image.

We have modified the dataset such that each image is 64x64 pixels (with 3 color channels), and the target is a single bounding box over all the digits. Your goal is to build a network that, given an image, returns bounding box coordinates for the location of the digit sequence.

This notebook is split into two parts:

  • Writing a custom data iterator
  • Building a prediction network

Custom dataset

Because the training set of ~27,000 images can fit into the memory of a single Titan X GPU, we could use the ArrayIterator class to provide data to the model. However, for datasets with more images or larger image sizes, that is no longer an option. Our high-performance DataLoader, which loads images in batches and performs complex augmentation, cannot currently handle bounding box data (stay tuned, an object localization dataloader is coming in a future neon release!).
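
As a quick back-of-the-envelope check, we can estimate the memory footprint of the training images (a sketch; the float32 storage assumption is ours):


In [ ]:
# rough memory footprint of the training images, assuming float32 storage
num_examples = 26624          # training images (see the variable list below)
num_features = 3 * 64 * 64    # 12288 values per image
bytes_per_value = 4           # float32

footprint_gb = num_examples * num_features * bytes_per_value / float(2 ** 30)
print("training set is roughly {:.2f} GB".format(footprint_gb))  # ~1.2 GB, well within a Titan X's 12 GB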

We've saved the dataset as a pickle file svhn_64.p. This file has a few variables:

  • X_train: a numpy array of shape (num_examples, num_features), where num_examples = 26624, and num_features = 3*64*64 = 12288
  • y_train: a numpy array of shape (num_examples, 4), with the target bounding box coordinates in (x_min, y_min, w, h) format.
  • X_test: a numpy array of shape (3328, 12288)
  • y_test: a numpy array of shape (3328, 4)

Let's first import our backend:


In [ ]:
from neon.backends import gen_backend

be = gen_backend(batch_size=128, backend='gpu')

# set the debug level to 10 (the minimum)
# to see all the output
import logging
main_logger = logging.getLogger('neon')
main_logger.setLevel(10)
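
If you don't have a compatible GPU, you can generate the CPU backend instead; training will be slower, but the rest of the notebook is unchanged:


In [ ]:
# fallback for machines without a GPU
be = gen_backend(batch_size=128, backend='cpu')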

Next, we load the pickle file with our SVHN dataset.


In [ ]:
import cPickle

fileName = '../data/svhn_64.p'
print("Loading {}...".format(fileName))

with open(fileName, 'rb') as f:
    svhn = cPickle.load(f)
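
As a quick sanity check, the loaded arrays should match the shapes listed above:


In [ ]:
# confirm that the arrays match the shapes described earlier
for key in ('X_train', 'y_train', 'X_test', 'y_test'):
    print("{}: {}".format(key, svhn[key].shape))

assert svhn['X_train'].shape == (26624, 12288)
assert svhn['y_train'].shape == (26624, 4)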

Below is a skeleton of the SVHN data iterator for you to fill out, with notes to help along the way. The goal is an object that returns, with each call, a tuple of (X, Y) for the input and the target bounding boxes.


In [ ]:
# import some useful packages
from neon.data import NervanaDataIterator
import numpy as np
import cPickle
import os

class SVHN(NervanaDataIterator):

    def __init__(self, X, Y, lshape):

        # Load the numpy data into some variables. We divide the image by 255 to normalize the values
        # between 0 and 1.
        self.X = X / 255.
        self.Y = Y
        self.shape = lshape  # shape of the input data (e.g. for images, (C, H, W))

        # 1. assign some required and useful attributes
        self.start = 0  # start at zero
        self.ndata = self.X.shape[0]  # number of images in X (hint: use X.shape)
        self.nfeatures = self.X.shape[1]  # number of features in X (hint: use X.shape)

        # number of minibatches per epoch
        # to calculate this, use the batch size, which is stored in self.be.bsz
        self.nbatches = self.ndata / self.be.bsz  # integer division; assumes ndata is a multiple of the batch size
        
        
        # 2. allocate memory on the GPU for a minibatch's worth of data.
        # (e.g. use `self.be` to access the backend.). See the backend documentation.
        # to get the minibatch size, use self.be.bsz
        # hint: X should have shape (# features, mini-batch size)
        # hint: use some of the attributes previously defined above
        self.dev_X = self.be.zeros((self.nfeatures, self.be.bsz))
        self.dev_Y = self.be.zeros((self.Y.shape[1], self.be.bsz))


    def reset(self):
        self.start = 0

    def __iter__(self):
        # 3. loop through minibatches in the dataset
        for index in range(self.start, self.ndata, self.be.bsz):
            # 3a. grab the right slice from the numpy arrays
            inputs = self.X[index:(index + self.be.bsz), :]
            targets = self.Y[index:(index + self.be.bsz), :]
            
            # The X and Y arrays are stored with shape (batch_size, num_features),
            # but the iterator needs to return data with shape (num_features, batch_size).
            # Here we transpose the data, then store it as a contiguous array:
            # numpy arrays need to be contiguous before being loaded onto the GPU.
            inputs = np.ascontiguousarray(inputs.T)
            targets = np.ascontiguousarray(targets.T)
                        
            # here we test your implementation
            # your slice has to have the same shape as the GPU tensors you allocated
            assert inputs.shape == self.dev_X.shape, \
                   "inputs has shape {}, but dev_X is {}".format(inputs.shape, self.dev_X.shape)
            assert targets.shape == self.dev_Y.shape, \
                   "targets has shape {}, but dev_Y is {}".format(targets.shape, self.dev_Y.shape)
            
            # 3b. transfer from numpy arrays to device
            # - use the GPU memory buffers allocated previously,
            #    and call the myTensorBuffer.set() function. 
            self.dev_X.set(inputs)
            self.dev_Y.set(targets)
            
            # 3c. yield a tuple of the device tensors.
            # X should be of shape (num_features, batch_size)
            # Y should be of shape (4, batch_size)
            yield (self.dev_X, self.dev_Y)

Check your implementation! Below we grab one minibatch from the iterator and print the output. Importantly: make sure that the output tensors are contiguous (i.e., is_contiguous = True in the output below). This means they are allocated in a contiguous block of memory, which is important for the downstream calculations. Contiguity can be broken by operations like transpose.


In [ ]:
# setup datasets
train_set = SVHN(X=svhn['X_train'], Y=svhn['y_train'], lshape=(3, 64, 64))

# grab one iteration from the train_set
iterator = train_set.__iter__()
(X, Y) = iterator.next()
print X  # this should be shape (12288, 128)
print Y  # this should be shape (4, 128)
assert X.is_contiguous
assert Y.is_contiguous
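
To see where contiguity can be lost, here is a minimal numpy-only illustration (independent of neon) of why the iterator calls np.ascontiguousarray after transposing:


In [ ]:
import numpy as np

a = np.zeros((128, 12288))
print(a.flags['C_CONTIGUOUS'])     # True: a freshly allocated array is contiguous
print(a.T.flags['C_CONTIGUOUS'])   # False: transposing only swaps strides, breaking contiguity
print(np.ascontiguousarray(a.T).flags['C_CONTIGUOUS'])  # True: the data is copied into contiguous memory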

If all goes well, you are ready to train a network on this dataset! First, let's reset the iterator to the beginning (since we drew one minibatch above). We also create a test set for evaluation.


In [ ]:
train_set.reset()

# generate test set
test_set = SVHN(X=svhn['X_test'], Y=svhn['y_test'], lshape=(3, 64, 64))

Model architecture

We recommend using a VGG-style convolutional neural network to train this model, using the ConvNet Design Philosophy we introduced earlier. We've imported some relevant packages that you may want to use, and have some guiding steps for implementing your network. Experiment with networks of different sizes!

Some tips:

  • Training for 10 epochs should take about 30 seconds per epoch. If your epochs are taking much longer than that, your network is too large.
  • Compare the training set cost and the validation set loss to make sure you are not overfitting the data.
  • Try to get a validation set loss of ~220 after 10 epochs.

In [ ]:
from neon.callbacks.callbacks import Callbacks
from neon.initializers import Gaussian
from neon.layers import GeneralizedCost, Affine, Conv, Pooling, Linear, Dropout
from neon.models import Model
from neon.optimizers import GradientDescentMomentum, RMSProp
from neon.transforms import Rectlin, Logistic, CrossEntropyMulti, Misclassification, SumSquared

init_norm = Gaussian(loc=0.0, scale=0.01)

# set up model layers
conv = dict(init=init_norm, batch_norm=True, activation=Rectlin())
convp1 = dict(init=init_norm, batch_norm=True, activation=Rectlin(), padding=1)

layers = [Conv((3, 3, 64), **convp1),  # 64x64 feature map
          Conv((3, 3, 64), **convp1),
          Pooling((2, 2)),
          Dropout(keep=.5),
          Conv((3, 3, 96), **convp1),  # 32x32 feature map
          Conv((3, 3, 96), **convp1),
          Pooling((2, 2)),
          Dropout(keep=.5),
          Conv((3, 3, 128), **convp1),  # 16x16 feature map
          Conv((3, 3, 128), **convp1),
          Pooling((2, 2)),
          Dropout(keep=.5),
          Conv((3, 3, 192), **convp1),  # 8x8 feature map
          Conv((1, 1, 192), **conv),
          Linear(nout=4, init=init_norm)] # linear output layer: 4 bounding box coordinates

# use SumSquared cost
cost = GeneralizedCost(costfunc=SumSquared())

# setup optimizer
optimizer = RMSProp()

# initialize model object
mlp = Model(layers=layers)

# configure callbacks
callbacks = Callbacks(mlp, eval_set=test_set, output_file='data.h5', eval_freq=1)

# run fit
mlp.fit(train_set, optimizer=optimizer, num_epochs=10, cost=cost, callbacks=callbacks)

Below we plot the cost data over time to help you visualize the training progress. This is similar to using the nvis command line tool to generate plots.


In [ ]:
from neon.visualizations.figure import cost_fig
from neon.visualizations.data import h5_cost_data
from bokeh.plotting import output_notebook, show

cost_data = h5_cost_data('data.h5', False)
output_notebook()
show(cost_fig(cost_data, 300, 600, epoch_axis=False))

To understand how the network performed, we sample images and plot the network's predicted bounding box against the ground truth bounding box. We evaluate this on the test_set, which was not used to train the network.


In [ ]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

# get a minibatch's worth of
# inputs (X) and targets (T)
iterator = test_set.__iter__()
(X, T) = iterator.next()

# fprop the input to get the model output
y = mlp.fprop(X)

# transfer from device to numpy arrays
y = y.get()
T = T.get()

Our ground truth box T and the model prediction y are both arrays of shape (4, batch_size). We plot a few test images below; feel free to modify imgs_to_plot to check performance on other examples. Red boxes are the model's guesses, and blue boxes are the ground truth.


In [ ]:
plt.figure(2)
imgs_to_plot = [0, 1, 2, 3]
for i in imgs_to_plot:
    plt.subplot(2, 2, i+1)

    title = "test {}".format(i)
    plt.imshow(X.get()[:, i].reshape(3, 64, 64).transpose(1, 2, 0))
    ax = plt.gca()
    ax.add_patch(plt.Rectangle((y[0,i], y[1,i]), y[2,i], y[3,i], fill=False, edgecolor="red")) # model guess
    ax.add_patch(plt.Rectangle((T[0,i], T[1,i]), T[2,i], T[3,i], fill=False, edgecolor="blue")) # ground truth
    plt.title(title)
    plt.axis('off')

In [ ]:
i=0
print "Target box had coordinates: {}".format(T[:,i])
print "Model prediction has coordinates: {}".format(y[:, i])
